NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Label poisoning is all you need

Liu, Xiyang; Jain, Prateek; Kong, Weihao; Oh, Sewoong; Suggala, Arun (December 2024, In 37th Conference on Neural Information Processing Systems (NeurIPS). Advances in Neural Information Processing Systems)

Full Text Available
Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Brown, Gavin; Hayase, Jonathan; Hopkins, Samuel; Kong, Weihao; Liu, Xiyang; Oh, Sewoong; Perdomo, Juan C; Smith, Adam (June 2024, Conference on Learning Theory. The 37th Annual Conference on Learning Theory (COLT 2024))

We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of X⊤X, where X is the design matrix. All prior private algorithms for this task require either d3/2 examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.
more » « less
Full Text Available
Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Brown, Gavin; Hayase, Jonathan; Hopkins, Samuel; Kong, Weihao; Liu, Xiyang; Oh, Sewoong; Perdomo, Juan C; Smith, Adam (June 2024, Conference on Learning Theory)
Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Brown, Gavin; Hayase, Jonathan; Hopkins, Samuel; Kong, Weihao; Liu, Xiyang; Oh, Sewoong; Perdomo, Juan C; Smith, Adam (June 2024, 37th Annual Conference on Learning Theory (COLT 2024))

Full Text Available
Fisher-Pitman Permutation Tests Based on Nonparametric Poisson Mixtures with Application to Single Cell Genomics

https://doi.org/10.1080/01621459.2022.2120401

Miao, Zhen; Kong, Weihao; Vinayak, Ramya Korlakai; Sun, Wei; Han, Fang (January 2024, Journal of the American Statistical Association)

Full Text Available
Label Robust and Differentially Private Linear Regression: Computational and Statistical Efficiency

Liu, Xiyang; Jain, Prateek; Kong, Weihao; Oh, Sewoong; Suggala, Arun (December 2023, In 37th Conference on Neural Information Processing Systems (NeurIPS). Advances in Neural Information Processing Systems)

Full Text Available
Robust and differentially private mean estimation

Liu, Xiyang; Kong, Weihao; Kakade, Sham; Oh, Sewoong (January 2021, Advances in neural information processing systems)

Full Text Available
Robust and differentially private mean estimation

Liu, Xiyang; Kong, Weihao; Kakade, Sham; Oh, Sewoong (January 2021, Advances in neural information processing systems)

Full Text Available
Meta-learning for Mixed Linear Regression

Kong, Weihao; Somani, Raghav; Song, Zhao; Kakade, Sham; Oh, Sewoong (July 2020, International Conference on Machine Learning)
null (Ed.)
n modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labelled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: When can abundant tasks with small data compensate for lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of k linear regressions, and identify sufficient conditions for such a graceful exchange to hold; there is little loss in sample complexity even when we only have access to small data tasks. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small data tasks with the help of Omega(k^3/2) medium data tasks each with Omega(k^1/2) examples.
more » « less
Full Text Available
Robust Meta-learning for Mixed Linear Regression with Small Batches

Kong, Weihao; Somani, Raghav; Kakade, Sham; Oh, Sewoong (January 2020, Advances in neural information processing systems)

A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second moment based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with k^1/2 examples each. However, this algorithm is brittle in two important scenarios. The predictions can be arbitrarily bad even with only a few outliers in the dataset; or even if the medium-sized tasks are slightly smaller with. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves an optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off.
more » « less
Full Text Available

« Prev Next »

Search for: All records